<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Chapter 9 Datasets | Extract, Analyze and Visualize Mutational Signatures with Sigminer</title>
<meta name="author" content="Shixiang Wang, PhD (Sun Yat-sen University Cancer Center)">
<meta name="author" content="Xue-Song Liu, PhD (ShanghaiTech University)">
<meta name="description" content="9.1 Reference annotation sigminer stores many reference annotation datasets for internal calculation. It can be exported for other usage either by data() or get_genome_annotation(). Currently,...">
<meta name="generator" content="bookdown 0.28 with bs4_book()">
<meta property="og:title" content="Chapter 9 Datasets | Extract, Analyze and Visualize Mutational Signatures with Sigminer">
<meta property="og:type" content="book">
<meta property="og:description" content="9.1 Reference annotation sigminer stores many reference annotation datasets for internal calculation. It can be exported for other usage either by data() or get_genome_annotation(). Currently,...">
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="Chapter 9 Datasets | Extract, Analyze and Visualize Mutational Signatures with Sigminer">
<meta name="twitter:description" content="9.1 Reference annotation sigminer stores many reference annotation datasets for internal calculation. It can be exported for other usage either by data() or get_genome_annotation(). Currently,...">
<!-- JS --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/fuse.js/6.4.6/fuse.js" integrity="sha512-zv6Ywkjyktsohkbp9bb45V6tEMoWhzFzXis+LrMehmJZZSys19Yxf1dopHx7WzIKxr5tK2dVcYmaCk2uqdjF4A==" crossorigin="anonymous"></script><script src="https://kit.fontawesome.com/6ecbd6c532.js" crossorigin="anonymous"></script><script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link href="libs/bootstrap-4.6.0/bootstrap.min.css" rel="stylesheet">
<script src="libs/bootstrap-4.6.0/bootstrap.bundle.min.js"></script><script src="libs/bs3compat-0.4.0/transition.js"></script><script src="libs/bs3compat-0.4.0/tabs.js"></script><script src="libs/bs3compat-0.4.0/bs3compat.js"></script><link href="libs/bs4_book-1.0.0/bs4_book.css" rel="stylesheet">
<script src="libs/bs4_book-1.0.0/bs4_book.js"></script><script src="libs/htmlwidgets-1.5.4/htmlwidgets.js"></script><link href="libs/datatables-css-0.0.0/datatables-crosstalk.css" rel="stylesheet">
<script src="libs/datatables-binding-0.24/datatables.js"></script><link href="libs/dt-core-1.11.3/css/jquery.dataTables.min.css" rel="stylesheet">
<link href="libs/dt-core-1.11.3/css/jquery.dataTables.extra.css" rel="stylesheet">
<script src="libs/dt-core-1.11.3/js/jquery.dataTables.min.js"></script><link href="libs/crosstalk-1.2.0/css/crosstalk.min.css" rel="stylesheet">
<script src="libs/crosstalk-1.2.0/js/crosstalk.min.js"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/autocomplete.js/0.38.0/autocomplete.jquery.min.js" integrity="sha512-GU9ayf+66Xx2TmpxqJpliWbT5PiGYxpaG8rfnBEk1LL8l1KGkRShhngwdXK1UgqhAzWpZHSiYPc09/NwDQIGyg==" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mark.js/8.11.1/mark.min.js" integrity="sha512-5CYOlHXGh6QpOFA/TeTylKLWfB3ftPsde7AnmhuitiTX4K5SqCLBeKro6sPS8ilsz1Q4NRx3v8Ko2IBiszzdww==" crossorigin="anonymous"></script><!-- CSS --><style type="text/css">
    
    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
  </style>
<style type="text/css">
    /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
    div.csl-bib-body { }
    div.csl-entry {
      clear: both;
        }
    .hanging div.csl-entry {
      margin-left:2em;
      text-indent:-2em;
    }
    div.csl-left-margin {
      min-width:2em;
      float:left;
    }
    div.csl-right-inline {
      margin-left:2em;
      padding-left:1em;
    }
    div.csl-indent {
      margin-left: 2em;
    }
  </style>
</head>
<body data-spy="scroll" data-target="#toc">

<div class="container-fluid">
<div class="row">
  <header class="col-sm-12 col-lg-3 sidebar sidebar-book"><a class="sr-only sr-only-focusable" href="#content">Skip to main content</a>

    <div class="d-flex align-items-start justify-content-between">
      <h1>
        <a href="index.html" title="">Extract, Analyze and Visualize Mutational Signatures with Sigminer</a>
      </h1>
      <button class="btn btn-outline-primary d-lg-none ml-2 mt-1" type="button" data-toggle="collapse" data-target="#main-nav" aria-expanded="true" aria-controls="main-nav"><i class="fas fa-bars"></i><span class="sr-only">Show table of contents</span></button>
    </div>

    <div id="main-nav" class="collapse-lg">
      <form role="search">
        <input id="search" class="form-control" type="search" placeholder="Search" aria-label="Search">
</form>

      <nav aria-label="Table of contents"><h2>Table of contents</h2>
        <ul class="book-toc list-unstyled">
<li><a class="" href="index.html">📖 Introduction</a></li>
<li class="book-part">Part I: Background and Prerequisite</li>
<li><a class="" href="mutsig-intro.html"><span class="header-section-number">1</span> Mutational signatures</a></li>
<li><a class="" href="prerequisite.html"><span class="header-section-number">2</span> Package prerequisite and installation</a></li>
<li class="book-part">Part II: Workflows</li>
<li><a class="" href="basic-workflow.html"><span class="header-section-number">3</span> Mutational signature analysis basics</a></li>
<li><a class="" href="analysis-supps.html"><span class="header-section-number">4</span> Other signature types</a></li>
<li><a class="" href="target-vis.html"><span class="header-section-number">5</span> Target visualization</a></li>
<li class="book-part">Part III: Miscellaneous topics</li>
<li><a class="" href="universal-analysis.html"><span class="header-section-number">6</span> Universal analysis</a></li>
<li><a class="" href="subtype-prediction.html"><span class="header-section-number">7</span> Subtype prediction</a></li>
<li><a class="" href="sigflow.html"><span class="header-section-number">8</span> Sigflow pipeline</a></li>
<li><a class="active" href="datasets.html"><span class="header-section-number">9</span> Datasets</a></li>
<li><a class="" href="convert.html"><span class="header-section-number">10</span> SBS signature conversion</a></li>
<li class="book-part">Appendix</li>
<li><a class="" href="references.html">References</a></li>
</ul>

        <div class="book-extra">
          
        </div>
      </nav>
</div>
  </header><main class="col-sm-12 col-md-9 col-lg-7" id="content"><div id="datasets" class="section level1" number="9">
<h1>
<span class="header-section-number">9</span> Datasets<a class="anchor" aria-label="anchor" href="#datasets"><i class="fas fa-link"></i></a>
</h1>
<div id="reference-annotation" class="section level2" number="9.1">
<h2>
<span class="header-section-number">9.1</span> Reference annotation<a class="anchor" aria-label="anchor" href="#reference-annotation"><i class="fas fa-link"></i></a>
</h2>
<p><strong>sigminer</strong> stores many reference annotation datasets for internal calculation. It can be exported for other usage either by <code><a href="https://rdrr.io/r/utils/data.html">data()</a></code> or <code><a href="https://rdrr.io/pkg/sigminer/man/get_genome_annotation.html">get_genome_annotation()</a></code>.</p>
<p>Currently, there are the following datasets:</p>
<ul>
<li><code>centromeres.hg19</code></li>
<li><code>centromeres.hg38</code></li>
<li><code>chromsize.hg19</code></li>
<li><code>chromsize.hg38</code></li>
<li><code>cytobands.hg19</code></li>
<li><code>cytobands.hg38</code></li>
</ul>
<p>An example is given as below:</p>
<div class="sourceCode" id="cb93"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/utils/data.html">data</a></span><span class="op">(</span><span class="st">"centromeres.hg19"</span><span class="op">)</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/utils/head.html">head</a></span><span class="op">(</span><span class="va">centromeres.hg19</span><span class="op">)</span></span>
<span><span class="co">##   chrom left.base right.base</span></span>
<span><span class="co">## 1  chr1 121535434  124535434</span></span>
<span><span class="co">## 2  chr2  92326171   95326171</span></span>
<span><span class="co">## 3  chr3  90504854   93504854</span></span>
<span><span class="co">## 4  chr4  49660117   52660117</span></span>
<span><span class="co">## 5  chr5  46405641   49405641</span></span>
<span><span class="co">## 6  chr6  58830166   61830166</span></span></code></pre></div>
<p><code><a href="https://rdrr.io/pkg/sigminer/man/get_genome_annotation.html">get_genome_annotation()</a></code> can better control the returned <code>data.frame</code>.</p>
<div class="sourceCode" id="cb94"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/pkg/sigminer/man/get_genome_annotation.html">get_genome_annotation</a></span><span class="op">(</span></span>
<span>  data_type <span class="op">=</span> <span class="st">"chr_size"</span>,</span>
<span>  chrs <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="st">"chr1"</span>, <span class="st">"chr10"</span>, <span class="st">"chr20"</span><span class="op">)</span>,</span>
<span>  genome_build <span class="op">=</span> <span class="st">"hg19"</span></span>
<span><span class="op">)</span></span>
<span><span class="co">##   chrom      size</span></span>
<span><span class="co">## 1  chr1 249250621</span></span>
<span><span class="co">## 2 chr10 135534747</span></span>
<span><span class="co">## 3 chr20  63025520</span></span></code></pre></div>
<p>More see <code><a href="https://rdrr.io/pkg/sigminer/man/get_genome_annotation.html">?get_genome_annotation</a></code>.</p>
</div>
<div id="copy-number-components-setting" class="section level2" number="9.2">
<h2>
<span class="header-section-number">9.2</span> Copy number components setting<a class="anchor" aria-label="anchor" href="#copy-number-components-setting"><i class="fas fa-link"></i></a>
</h2>
<p>Dataset <code>CN.features</code> is a predefined component data table for identifying copy number signatures by method “Wang”.
Users can define a custom table with similar structure and pass it to function like <code><a href="https://rdrr.io/pkg/sigminer/man/sig_tally.html">sig_tally()</a></code>.</p>
<p>Detail about how to generate this dataset can be viewed at <a href="https://github.com/ShixiangWang/sigminer/blob/master/data-raw/CN-features.R" class="uri">https://github.com/ShixiangWang/sigminer/blob/master/data-raw/CN-features.R</a>.</p>
<div class="sourceCode" id="cb95"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">CN.features</span></span>
<span><span class="co">##     feature         component label  min max</span></span>
<span><span class="co">##  1:  BP10MB         BP10MB[0] point    0   0</span></span>
<span><span class="co">##  2:  BP10MB         BP10MB[1] point    1   1</span></span>
<span><span class="co">##  3:  BP10MB         BP10MB[2] point    2   2</span></span>
<span><span class="co">##  4:  BP10MB         BP10MB[3] point    3   3</span></span>
<span><span class="co">##  5:  BP10MB         BP10MB[4] point    4   4</span></span>
<span><span class="co">##  6:  BP10MB         BP10MB[5] point    5   5</span></span>
<span><span class="co">##  7:  BP10MB        BP10MB[&gt;5] range    5 Inf</span></span>
<span><span class="co">##  8:   BPArm          BPArm[0] point    0   0</span></span>
<span><span class="co">##  9:   BPArm          BPArm[1] point    1   1</span></span>
<span><span class="co">## 10:   BPArm          BPArm[2] point    2   2</span></span>
<span><span class="co">## 11:   BPArm          BPArm[3] point    3   3</span></span>
<span><span class="co">## 12:   BPArm          BPArm[4] point    4   4</span></span>
<span><span class="co">## 13:   BPArm          BPArm[5] point    5   5</span></span>
<span><span class="co">## 14:   BPArm          BPArm[6] point    6   6</span></span>
<span><span class="co">## 15:   BPArm          BPArm[7] point    7   7</span></span>
<span><span class="co">## 16:   BPArm          BPArm[8] point    8   8</span></span>
<span><span class="co">## 17:   BPArm          BPArm[9] point    9   9</span></span>
<span><span class="co">## 18:   BPArm         BPArm[10] point   10  10</span></span>
<span><span class="co">## 19:   BPArm BPArm[&gt;10 &amp; &lt;=20] range   10  20</span></span>
<span><span class="co">## 20:   BPArm BPArm[&gt;20 &amp; &lt;=30] range   20  30</span></span>
<span><span class="co">## 21:   BPArm        BPArm[&gt;30] range   30 Inf</span></span>
<span><span class="co">## 22:      CN             CN[0] point    0   0</span></span>
<span><span class="co">## 23:      CN             CN[1] point    1   1</span></span>
<span><span class="co">## 24:      CN             CN[2] point    2   2</span></span>
<span><span class="co">## 25:      CN             CN[3] point    3   3</span></span>
<span><span class="co">## 26:      CN             CN[4] point    4   4</span></span>
<span><span class="co">## 27:      CN      CN[&gt;4 &amp; &lt;=8] range    4   8</span></span>
<span><span class="co">## 28:      CN            CN[&gt;8] range    8 Inf</span></span>
<span><span class="co">## 29:    CNCP           CNCP[0] point    0   0</span></span>
<span><span class="co">## 30:    CNCP           CNCP[1] point    1   1</span></span>
<span><span class="co">## 31:    CNCP           CNCP[2] point    2   2</span></span>
<span><span class="co">## 32:    CNCP           CNCP[3] point    3   3</span></span>
<span><span class="co">## 33:    CNCP           CNCP[4] point    4   4</span></span>
<span><span class="co">## 34:    CNCP    CNCP[&gt;4 &amp; &lt;=8] range    4   8</span></span>
<span><span class="co">## 35:    CNCP          CNCP[&gt;8] range    8 Inf</span></span>
<span><span class="co">## 36:    OsCN           OsCN[0] point    0   0</span></span>
<span><span class="co">## 37:    OsCN           OsCN[1] point    1   1</span></span>
<span><span class="co">## 38:    OsCN           OsCN[2] point    2   2</span></span>
<span><span class="co">## 39:    OsCN           OsCN[3] point    3   3</span></span>
<span><span class="co">## 40:    OsCN           OsCN[4] point    4   4</span></span>
<span><span class="co">## 41:    OsCN   OsCN[&gt;4 &amp; &lt;=10] range    4  10</span></span>
<span><span class="co">## 42:    OsCN         OsCN[&gt;10] range   10 Inf</span></span>
<span><span class="co">## 43:      SS           SS[&lt;=2] range -Inf   2</span></span>
<span><span class="co">## 44:      SS      SS[&gt;2 &amp; &lt;=3] range    2   3</span></span>
<span><span class="co">## 45:      SS      SS[&gt;3 &amp; &lt;=4] range    3   4</span></span>
<span><span class="co">## 46:      SS      SS[&gt;4 &amp; &lt;=5] range    4   5</span></span>
<span><span class="co">## 47:      SS      SS[&gt;5 &amp; &lt;=6] range    5   6</span></span>
<span><span class="co">## 48:      SS      SS[&gt;6 &amp; &lt;=7] range    6   7</span></span>
<span><span class="co">## 49:      SS      SS[&gt;7 &amp; &lt;=8] range    7   8</span></span>
<span><span class="co">## 50:      SS            SS[&gt;8] range    8 Inf</span></span>
<span><span class="co">## 51:    NC50         NC50[&lt;=2] range -Inf   2</span></span>
<span><span class="co">## 52:    NC50           NC50[3] point    3   3</span></span>
<span><span class="co">## 53:    NC50           NC50[4] point    4   4</span></span>
<span><span class="co">## 54:    NC50           NC50[5] point    5   5</span></span>
<span><span class="co">## 55:    NC50           NC50[6] point    6   6</span></span>
<span><span class="co">## 56:    NC50           NC50[7] point    7   7</span></span>
<span><span class="co">## 57:    NC50          NC50[&gt;7] range    7 Inf</span></span>
<span><span class="co">## 58:   BoChr          BoChr[1] point    1   1</span></span>
<span><span class="co">## 59:   BoChr          BoChr[2] point    2   2</span></span>
<span><span class="co">## 60:   BoChr          BoChr[3] point    3   3</span></span>
<span><span class="co">## 61:   BoChr          BoChr[4] point    4   4</span></span>
<span><span class="co">## 62:   BoChr          BoChr[5] point    5   5</span></span>
<span><span class="co">## 63:   BoChr          BoChr[6] point    6   6</span></span>
<span><span class="co">## 64:   BoChr          BoChr[7] point    7   7</span></span>
<span><span class="co">## 65:   BoChr          BoChr[8] point    8   8</span></span>
<span><span class="co">## 66:   BoChr          BoChr[9] point    9   9</span></span>
<span><span class="co">## 67:   BoChr         BoChr[10] point   10  10</span></span>
<span><span class="co">## 68:   BoChr         BoChr[11] point   11  11</span></span>
<span><span class="co">## 69:   BoChr         BoChr[12] point   12  12</span></span>
<span><span class="co">## 70:   BoChr         BoChr[13] point   13  13</span></span>
<span><span class="co">## 71:   BoChr         BoChr[14] point   14  14</span></span>
<span><span class="co">## 72:   BoChr         BoChr[15] point   15  15</span></span>
<span><span class="co">## 73:   BoChr         BoChr[16] point   16  16</span></span>
<span><span class="co">## 74:   BoChr         BoChr[17] point   17  17</span></span>
<span><span class="co">## 75:   BoChr         BoChr[18] point   18  18</span></span>
<span><span class="co">## 76:   BoChr         BoChr[19] point   19  19</span></span>
<span><span class="co">## 77:   BoChr         BoChr[20] point   20  20</span></span>
<span><span class="co">## 78:   BoChr         BoChr[21] point   21  21</span></span>
<span><span class="co">## 79:   BoChr         BoChr[22] point   22  22</span></span>
<span><span class="co">## 80:   BoChr         BoChr[23] point   23  23</span></span>
<span><span class="co">##     feature         component label  min max</span></span></code></pre></div>
</div>
</div>
  <div class="chapter-nav">
<div class="prev"><a href="sigflow.html"><span class="header-section-number">8</span> Sigflow pipeline</a></div>
<div class="next"><a href="convert.html"><span class="header-section-number">10</span> SBS signature conversion</a></div>
</div></main><div class="col-md-3 col-lg-2 d-none d-md-block sidebar sidebar-chapter">
    <nav id="toc" data-toggle="toc" aria-label="On this page"><h2>On this page</h2>
      <ul class="nav navbar-nav">
<li><a class="nav-link" href="#datasets"><span class="header-section-number">9</span> Datasets</a></li>
<li><a class="nav-link" href="#reference-annotation"><span class="header-section-number">9.1</span> Reference annotation</a></li>
<li><a class="nav-link" href="#copy-number-components-setting"><span class="header-section-number">9.2</span> Copy number components setting</a></li>
</ul>

      <div class="book-extra">
        <ul class="list-unstyled">
          
        </ul>
</div>
    </nav>
</div>

</div>
</div> <!-- .container -->

<footer class="bg-primary text-light mt-5"><div class="container"><div class="row">

  <div class="col-12 col-md-6 mt-3">
    <p>"<strong>Extract, Analyze and Visualize Mutational Signatures with Sigminer</strong>" was written by Shixiang Wang, PhD (Sun Yat-sen University Cancer Center), Xue-Song Liu, PhD (ShanghaiTech University). It was last built on 2022-08-29.</p>
  </div>

  <div class="col-12 col-md-6 mt-3">
    <p>This book was built by the <a class="text-light" href="https://bookdown.org">bookdown</a> R package.</p>
  </div>

</div></div>
</footer><!-- dynamically load mathjax for compatibility with self-contained --><script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    var src = "true";
    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
    if (location.protocol !== "file:")
      if (/^https?:/.test(src))
        src = src.replace(/^https?:/, '');
    script.src = src;
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script><script type="text/x-mathjax-config">const popovers = document.querySelectorAll('a.footnote-ref[data-toggle="popover"]');
for (let popover of popovers) {
  const div = document.createElement('div');
  div.setAttribute('style', 'position: absolute; top: 0, left:0; width:0, height:0, overflow: hidden; visibility: hidden;');
  div.innerHTML = popover.getAttribute('data-content');

  var has_math = div.querySelector("span.math");
  if (has_math) {
    document.body.appendChild(div);
    MathJax.Hub.Queue(["Typeset", MathJax.Hub, div]);
    MathJax.Hub.Queue(function() {
      popover.setAttribute('data-content', div.innerHTML);
      document.body.removeChild(div);
    })
  }
}
</script>
</body>
</html>
